[Hardware][Powerpc] Enable prefix caching and chunked prefill for ppc64le #35081
Signed-off-by: Akash kaothalkar <akash.kaothalkar@ibm.com>
Code Review
This pull request enables chunked prefill and prefix caching for the PowerPC (ppc64le) architecture by removing the explicit architecture check in the engine configuration. The provided benchmark results demonstrate successful operation and performance improvements on this hardware. However, the refactoring is incomplete as the log messages within the associated code block still incorrectly list 'POWER' as an unsupported architecture, which will lead to misleading information for users on other platforms like s390x or RISC-V.
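One way to keep such a log message from drifting out of sync with the check is to build both from the same list. A minimal Python sketch, assuming a hypothetical UNSUPPORTED_ARCHS tuple and hypothetical helper names; these are illustrative only and do not reflect the actual code in vllm/engine/arg_utils.py:

```python
# Hypothetical list of CPU architectures where the features stay disabled.
# Assumption for illustration; not the real list used by vLLM.
UNSUPPORTED_ARCHS = ("s390x", "riscv64")


def features_supported(machine: str, unsupported=UNSUPPORTED_ARCHS) -> bool:
    """Return True if chunked prefill / prefix caching may be enabled."""
    return machine.lower() not in unsupported


def unsupported_message(feature: str, unsupported=UNSUPPORTED_ARCHS) -> str:
    """Build the log text from the same tuple the check uses."""
    names = ", ".join(unsupported)
    return f"{feature} is not supported on {names} CPUs; disabling it."


if __name__ == "__main__":
    # ppc64le is no longer in the list, so it cannot appear in the message.
    print(features_supported("ppc64le"))
    print(unsupported_message("Chunked prefill"))
```

Because the message is derived from the tuple, removing an architecture from the check also removes it from the log text.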
Hi @bigPYJ1151, can you please take a look at this PR?
Hi @bigPYJ1151, can you please look at the changes?
This pull request has merge conflicts that must be resolved before it can be merged.
Hi @Akashcodes732, there are some conflicts that need to be resolved :)
Hi @bigPYJ1151, I have fixed the merge conflicts.
Hi @bigPYJ1151, the failures look unrelated to the fix. Can you please suggest how to proceed?
Hi @bigPYJ1151, I think you need to approve again :)
…4le (vllm-project#35081) Signed-off-by: Akash kaothalkar <akash.kaothalkar@ibm.com> Co-authored-by: Akash kaothalkar <akash.kaothalkar@ibm.com>
Purpose
Removes the check for POWERPC in vllm/engine/arg_utils.py to enable chunked prefill and prefix caching.
Test Plan and Result
Server ran with Prefix Caching Enabled
    vllm bench serve \
      --backend openai \
      --model ibm-granite/granite-3.3-8b-instruct \
      --dataset-name prefix_repetition \
      --num-prompts 100 \
      --prefix-repetition-prefix-len 512 \
      --prefix-repetition-suffix-len 128 \
      --prefix-repetition-num-prefixes 5 \
      --prefix-repetition-output-len 128

Ran server with prefix caching disabled
Fixed Prompt with Prefix Caching
    python benchmarks/benchmark_prefix_caching.py \
      --model ibm-granite/granite-3.3-8b-instruct \
      --enable-prefix-caching \
      --num-prompts 1 \
      --repeat-count 100 \
      --input-length-range 128:256

ShareGPT Dataset with Prefix Caching
    python benchmarks/benchmark_prefix_caching.py \
      --model ibm-granite/granite-3.3-8b-instruct \
      --dataset-path ./ShareGPT_V3_unfiltered_cleaned_split.json \
      --enable-prefix-caching \
      --num-prompts 20 \
      --repeat-count 5 \
      --input-length-range 128:256
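For a rough sense of what the prefix_repetition settings in the test plan can save, the arithmetic can be sketched in Python. This is a back-of-the-envelope estimate, not a measurement: it assumes an ideal cache that never evicts, that every prefix is fully reusable after its first use, and that block-size granularity can be ignored. The function name is hypothetical.

```python
def ideal_prefill_tokens(num_prompts, prefix_len, suffix_len, num_prefixes):
    """Return (tokens prefilled without caching, tokens prefilled with an ideal cache)."""
    # Without caching, every prompt pays for its full prefix and suffix.
    without_cache = num_prompts * (prefix_len + suffix_len)
    # With an ideal cache, each distinct prefix is computed once;
    # suffixes are assumed unique and are always computed.
    used_prefixes = min(num_prefixes, num_prompts)
    with_cache = used_prefixes * prefix_len + num_prompts * suffix_len
    return without_cache, with_cache


if __name__ == "__main__":
    # Values taken from the vllm bench serve command in the test plan.
    total, cached = ideal_prefill_tokens(
        num_prompts=100, prefix_len=512, suffix_len=128, num_prefixes=5
    )
    saved = 1 - cached / total
    print(total, cached, f"{saved:.0%}")  # 64000 15360 76%
```

Under these assumptions, about three quarters of the prefill tokens could be served from cache, which is the upper bound against which measured results can be compared.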